Skip to content

perf: cache or index TestSuiteGate relevant-test discovery#143

Open
shrutu0929 wants to merge 2 commits intoRefactron-ai:mainfrom
shrutu0929:perf/cache-test-suite-gate
Open

perf: cache or index TestSuiteGate relevant-test discovery#143
shrutu0929 wants to merge 2 commits intoRefactron-ai:mainfrom
shrutu0929:perf/cache-test-suite-gate

Conversation

@shrutu0929
Copy link
Copy Markdown

@shrutu0929 shrutu0929 commented Apr 3, 2026

solve #139
The recent commit introduces a performance optimization for test file discovery within the TestSuiteGate._find_relevant_tests method in order to reduce latency in large repositories. We implemented an in-memory caching mechanism that maps either the project_root or search_root to specific target modules, storing their uniquely associated test files. When a test discovery is requested, the system first consults this cache to immediately yield previously resolved paths. If the cache is unpopulated, it performs the standard file scan but with an added refinement: it intelligently skips scanning obvious non-target directories (such as virtual environments, .pytest_cache, and site-packages) to further minimize disk I/O. The parsed and verified test paths are then saved to the cache, eliminating the overhead of continuously re-reading and parsing the Abstract Syntax Tree (AST) for unchanged code on subsequent executions.

Summary by CodeRabbit

  • New Features
    • Introduced caching for test discovery results to boost performance and reduce scan times
    • Improved test file identification through comprehensive scanning and import analysis

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 3, 2026

📝 Walkthrough

Walkthrough

Added a module-level test discovery cache and implemented a new TestSuiteGate with logic to recursively scan for test files, filter excluded directories, and parse AST to detect module imports. Constructor signature updated to accept search_root parameter.

Changes

Cohort / File(s) Summary
Test Discovery Infrastructure
refactron/verification/checks/test_gate.py
Added _TEST_DISCOVERY_CACHE global for caching discovered test files. Implemented test discovery with directory filtering (venv, site-packages, .git, __pycache__), AST-based import detection, and helper methods _imports_module() and _find_relevant_tests(). Updated TestSuiteGate.__init__() signature to accept search_root parameter.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐰 A cache of tests, now swift and keen,
Scanning paths where none were seen,
AST whispers of imports true,
No venv clutter blocks my view! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: adding caching/indexing to TestSuiteGate's test discovery for performance improvement.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
refactron/verification/checks/test_gate.py (1)

73-80: 🛠️ Refactor suggestion | 🟠 Major

Integration guidance: Merge caching into the existing class.

To properly implement this PR's caching optimization:

  1. Keep the existing TestSuiteGate(BaseCheck) class as the single definition
  2. Add the _TEST_DISCOVERY_CACHE at module level (after existing imports)
  3. Modify the existing _find_relevant_tests method (line 166) to consult/update the cache
  4. Add the ignore_patterns filtering to the existing method
  5. Keep the existing _imports_module exact-matching logic (line 186)

This preserves API compatibility and maintains the BaseCheck contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 73 - 80, Keep the
single TestSuiteGate(BaseCheck) class and add a module-level cache named
_TEST_DISCOVERY_CACHE (defined after imports). In the existing
TestSuiteGate._find_relevant_tests method, consult _TEST_DISCOVERY_CACHE keyed
by relevant state (e.g., changed file path or module name) to return cached test
lists when present, and update the cache after discovering tests. Also apply
ignore_patterns filtering inside _find_relevant_tests before returning/updating
the cache. Do not change the exact-matching logic in
TestSuiteGate._imports_module; leave that method as-is.
🧹 Nitpick comments (1)
refactron/verification/checks/test_gate.py (1)

53-58: Cache stores Path objects without invalidation strategy.

The cache stores Path objects that could become stale if test files are added, modified, or deleted between runs. Consider:

  1. Storing modification timestamps and checking for staleness
  2. Adding a TTL-based expiration
  3. Providing a clear_cache() function for explicit invalidation
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 53 - 58, The cache
(_TEST_DISCOVERY_CACHE) currently stores Path objects for a
cache_key/target_module pair and can become stale; modify the logic around the
caching in the function that writes/reads _TEST_DISCOVERY_CACHE so that before
returning cached relevant_tests you validate freshness—store file mtimes (or a
hash) alongside the Path entries and compare current mtimes to detect changes,
or attach a TTL timestamp and expire entries when older than the TTL;
additionally add a public clear_cache() helper that empties
_TEST_DISCOVERY_CACHE for explicit invalidation and ensure the function that
populates the cache updates the stored metadata whenever relevant_tests are
recalculated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@refactron/verification/checks/test_gate.py`:
- Around line 42-51: The loop scanning files via search_root.rglob("*.py")
incorrectly applies the ignore_patterns only when project_root is set and only
recognizes test files with a "test_" prefix; update the logic so the ignore
check (any(part in ignore_patterns for part in p.parts)) runs unconditionally
for every candidate from search_root.rglob, and broaden the test-file detection
to accept either p.name.startswith("test_") or p.name.endswith("_test.py"); keep
the existing p.is_file() and _imports_module(p, target_module) checks and append
matching paths to relevant_tests as before.
- Around line 21-27: The import-check currently uses substring matching
(target_module in alias.name and target_module in node.module) which causes
false positives; update the checks in the ast.Import and ast.ImportFrom branches
to use exact-module or submodule matching instead: replace "target_module in
alias.name" with "alias.name == target_module or
alias.name.startswith(target_module + '.')", and replace "target_module in
node.module" with "node.module == target_module or
node.module.startswith(target_module + '.')", keeping the rest of the logic
around ast.Import, ast.ImportFrom, alias.name and node.module unchanged.
- Around line 10-14: Remove the duplicate TestSuiteGate class and merge its
caching behavior into the existing TestSuiteGate that inherits from BaseCheck;
specifically, delete the standalone class definition that defines __init__(self,
search_root: Path, project_root: Optional[Path] = None) and instead incorporate
any cache fields and logic into the existing TestSuiteGate(BaseCheck)
implementation (keep its original constructor signature and API), update its
__init__ to initialize cache using search_root/project_root values without
changing parameters expected by callers, and ensure the existing verify(self,
...) method implements the new caching checks so the class continues to satisfy
the BaseCheck contract used by verification/engine.py.
- Line 59: The module docstring string literal """TestSuiteGate — Check 3: run
relevant tests against transformed code.""" is not at the top of the file; move
that docstring to be the very first statement in the module (before any imports)
and remove the unintended prepended lines that pushed it down (the stray content
added above the imports), ensuring the file begins with the docstring and then
the imports and definitions (e.g., classes/functions in this module such as
TestSuiteGate or any test-related functions).
- Around line 28-30: The code currently swallows all exceptions in the except
Exception: pass block (returning False), which hides parsing/encoding errors;
change the handler to except Exception as e: and log the exception (for example
using logging.exception(...) or a module-level logger.exception(...)) before
returning False so errors during AST parsing in test_gate.py are visible; ensure
the logging call includes context (e.g., "failed to evaluate test gate" and the
exception).
- Around line 1-8: The new caching code was prepended and created duplicate
imports and a second TestSuiteGate definition; remove the duplicated top-level
imports and the extra TestSuiteGate class and integrate the caching logic into
the existing TestSuiteGate class (use the class's existing methods that discover
tests to consult/update the cache named _TEST_DISCOVERY_CACHE rather than a
separate block). Replace the unbounded module-level dict by using
functools.lru_cache or a bounded eviction structure (or wrap cache accesses with
a max-size policy) and ensure all references use the same _TEST_DISCOVERY_CACHE
symbol inside TestSuiteGate so there is a single canonical implementation.

---

Outside diff comments:
In `@refactron/verification/checks/test_gate.py`:
- Around line 73-80: Keep the single TestSuiteGate(BaseCheck) class and add a
module-level cache named _TEST_DISCOVERY_CACHE (defined after imports). In the
existing TestSuiteGate._find_relevant_tests method, consult
_TEST_DISCOVERY_CACHE keyed by relevant state (e.g., changed file path or module
name) to return cached test lists when present, and update the cache after
discovering tests. Also apply ignore_patterns filtering inside
_find_relevant_tests before returning/updating the cache. Do not change the
exact-matching logic in TestSuiteGate._imports_module; leave that method as-is.

---

Nitpick comments:
In `@refactron/verification/checks/test_gate.py`:
- Around line 53-58: The cache (_TEST_DISCOVERY_CACHE) currently stores Path
objects for a cache_key/target_module pair and can become stale; modify the
logic around the caching in the function that writes/reads _TEST_DISCOVERY_CACHE
so that before returning cached relevant_tests you validate freshness—store file
mtimes (or a hash) alongside the Path entries and compare current mtimes to
detect changes, or attach a TTL timestamp and expire entries when older than the
TTL; additionally add a public clear_cache() helper that empties
_TEST_DISCOVERY_CACHE for explicit invalidation and ensure the function that
populates the cache updates the stored metadata whenever relevant_tests are
recalculated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9b4e40d5-cc44-4733-94a6-658cc1b005ec

📥 Commits

Reviewing files that changed from the base of the PR and between c89cfdb and 03af1c7.

📒 Files selected for processing (1)
  • refactron/verification/checks/test_gate.py

Comment on lines +1 to +8
import ast
from pathlib import Path
from typing import Dict, List, Optional

# Global cache for module -> relevant tests mapping
# Structure: {project_root_str: {target_module: [test_paths]}}
_TEST_DISCOVERY_CACHE: Dict[str, Dict[str, List[Path]]] = {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: New code was prepended without integrating with existing file content.

The pipeline failures (F811, E402) confirm this file now has duplicate imports and a duplicate TestSuiteGate class definition. The new caching logic in lines 1-58 must be merged into the existing class at line 73, not added as a separate block.

Additionally, the module-level _TEST_DISCOVERY_CACHE is unbounded and will grow indefinitely in long-running processes. Consider adding a max-size eviction policy or using functools.lru_cache.

🧰 Tools
🪛 GitHub Actions: Pre-commit

[error] 1-1: pre-commit hook black failed and modified files (reformatted refactron/verification/checks/test_gate.py).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 1 - 8, The new
caching code was prepended and created duplicate imports and a second
TestSuiteGate definition; remove the duplicated top-level imports and the extra
TestSuiteGate class and integrate the caching logic into the existing
TestSuiteGate class (use the class's existing methods that discover tests to
consult/update the cache named _TEST_DISCOVERY_CACHE rather than a separate
block). Replace the unbounded module-level dict by using functools.lru_cache or
a bounded eviction structure (or wrap cache accesses with a max-size policy) and
ensure all references use the same _TEST_DISCOVERY_CACHE symbol inside
TestSuiteGate so there is a single canonical implementation.

Comment on lines +10 to +14
class TestSuiteGate:
def __init__(self, search_root: Path, project_root: Optional[Path] = None):
self.search_root = search_root
self.project_root = project_root

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Duplicate class definition missing BaseCheck inheritance.

This TestSuiteGate class (line 10) conflicts with the existing TestSuiteGate(BaseCheck) at line 73. The new class:

  1. Does not inherit from BaseCheck, violating the verification engine contract
  2. Changes the constructor signature (adds required search_root parameter), breaking API compatibility
  3. Lacks the verify() method required by BaseCheck

The caching logic should be integrated into the existing class at line 73, not defined as a separate class.

Based on learnings: "BaseCheck ABC must have 3 implementations: SyntaxVerifier, ImportIntegrityVerifier, and TestSuiteGate in verification/engine.py"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 10 - 14, Remove the
duplicate TestSuiteGate class and merge its caching behavior into the existing
TestSuiteGate that inherits from BaseCheck; specifically, delete the standalone
class definition that defines __init__(self, search_root: Path, project_root:
Optional[Path] = None) and instead incorporate any cache fields and logic into
the existing TestSuiteGate(BaseCheck) implementation (keep its original
constructor signature and API), update its __init__ to initialize cache using
search_root/project_root values without changing parameters expected by callers,
and ensure the existing verify(self, ...) method implements the new caching
checks so the class continues to satisfy the BaseCheck contract used by
verification/engine.py.

Comment on lines +21 to +27
if isinstance(node, ast.Import):
for alias in node.names:
if target_module in alias.name:
return True
elif isinstance(node, ast.ImportFrom):
if node.module and target_module in node.module:
return True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bug: Substring matching produces false positives.

Using target_module in alias.name (lines 23, 26) matches any import containing the target as a substring. For example, searching for "util" would incorrectly match "utilities", "string_util", or "my_utility_module".

The existing implementation at line 197-201 correctly uses exact matching:

# Correct approach (from existing code)
if alias.name == module_name or alias.name.startswith(module_name + "."):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 21 - 27, The
import-check currently uses substring matching (target_module in alias.name and
target_module in node.module) which causes false positives; update the checks in
the ast.Import and ast.ImportFrom branches to use exact-module or submodule
matching instead: replace "target_module in alias.name" with "alias.name ==
target_module or alias.name.startswith(target_module + '.')", and replace
"target_module in node.module" with "node.module == target_module or
node.module.startswith(target_module + '.')", keeping the rest of the logic
around ast.Import, ast.ImportFrom, alias.name and node.module unchanged.

Comment on lines +28 to +30
except Exception:
pass
return False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid silent exception swallowing.

Catching all exceptions and silently passing hides AST parsing errors, encoding issues, and other problems that would be valuable for debugging. At minimum, log the exception.

Proposed fix
+        import logging
+        logger = logging.getLogger(__name__)
+        # ...
         except Exception:
-            pass
+            logger.debug("Failed to parse %s for import detection", test_file, exc_info=True)
         return False
🧰 Tools
🪛 Ruff (0.15.7)

[error] 28-29: try-except-pass detected, consider logging the exception

(S110)


[warning] 28-28: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 28 - 30, The code
currently swallows all exceptions in the except Exception: pass block (returning
False), which hides parsing/encoding errors; change the handler to except
Exception as e: and log the exception (for example using logging.exception(...)
or a module-level logger.exception(...)) before returning False so errors during
AST parsing in test_gate.py are visible; ensure the logging call includes
context (e.g., "failed to evaluate test gate" and the exception).

Comment on lines +42 to +51
for p in self.search_root.rglob("*.py"):
# Skip obvious non-target paths if project_root is set
if self.project_root:
# Check if any parent directory is in the ignore list
if any(part in ignore_patterns for part in p.parts):
continue

if p.is_file() and p.name.startswith("test_"):
if self._imports_module(p, target_module):
relevant_tests.append(p)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Logic issues: Inconsistent filtering and incomplete test file detection.

  1. Inconsistent ignore filtering (lines 44-47): The ignore_patterns filter only applies when project_root is set, but rglob always scans from search_root. This means virtual environments and cache directories are scanned when project_root is None.

  2. Missing test file pattern (line 49): Only checks for test_ prefix, but the existing implementation at line 174 also matches *_test.py suffix:

    if not (name.startswith("test_") or name.endswith("_test.py")):
Proposed fix for test pattern
-            if p.is_file() and p.name.startswith("test_"):
+            if p.is_file() and (p.name.startswith("test_") or p.name.endswith("_test.py")):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 42 - 51, The loop
scanning files via search_root.rglob("*.py") incorrectly applies the
ignore_patterns only when project_root is set and only recognizes test files
with a "test_" prefix; update the logic so the ignore check (any(part in
ignore_patterns for part in p.parts)) runs unconditionally for every candidate
from search_root.rglob, and broaden the test-file detection to accept either
p.name.startswith("test_") or p.name.endswith("_test.py"); keep the existing
p.is_file() and _imports_module(p, target_module) checks and append matching
paths to relevant_tests as before.


_TEST_DISCOVERY_CACHE[cache_key][target_module] = relevant_tests
return relevant_tests
"""TestSuiteGate — Check 3: run relevant tests against transformed code."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Module docstring is misplaced due to code prepending.

The module docstring should be at the top of the file (before imports). Its current position confirms that lines 1-58 were incorrectly prepended to the existing file content.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` at line 59, The module docstring
string literal """TestSuiteGate — Check 3: run relevant tests against
transformed code.""" is not at the top of the file; move that docstring to be
the very first statement in the module (before any imports) and remove the
unintended prepended lines that pushed it down (the stray content added above
the imports), ensuring the file begins with the docstring and then the imports
and definitions (e.g., classes/functions in this module such as TestSuiteGate or
any test-related functions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant